BigDFT.RemoteRunners module

This module wraps the classes and functions you will need if you want to run PyBigDFT operations on a remote machine.

class RemoteRunner(function=None, submitter='bash', name=None, url=None, skip=False, asynchronous=True, remote_dir='.', rsync='', local_dir='.', script='#!/bin/bash\n', python='python', arguments=None, protocol='JSON', extra_encoder_functions=None, required_files=None, output_files=None, **kwargs)

This class runs Python functions on a remote machine by combining the execution of a script with a remote Python function.

Parameters
  • function (func) – the python function to be serialized. The return value of the function should be an object for which a relatively light-weight serialization is possible.

  • name (str) – the name of the remote function. Function name is employed if not provided.

  • script (str) – The script to be executed provided in string form. The result file of the script is assumed to be named <name>-function-result.

  • submitter (str) – the interpreter to be invoked. Should be, e.g. bash if the script is a shell script, or qsub if this is a submission script.

  • url (str, URL) – either a user@host string or URL() connection class (preferable)

  • remote_dir (str) – the path to the work directory on the remote machine. Should be associated to a writable directory.

  • skip (bool) – if true, we perform a lazy calculation.

  • asynchronous (bool) – If True, submit the calculation without waiting for the results.

  • local_dir (str) – local directory to prepare the IO files to send. The directory should exist and write permission should be granted.

  • python (str) – python interpreter to be invoked in the script.

  • protocol (str) – serialization method to be invoked for the function. Can be ‘JSON’, ‘Dill’ or ‘JSONPickle’, depending on the desired version.

  • extra_encoder_functions (list) – list of dictionaries of the format {‘cls’: Class, ‘func’: function}, employed to serialize non-intrinsic objects as well as non-numpy objects. Useful for the ‘JSON’ protocol.

  • required_files (list) – list of extra files that the function may need in order to run correctly.

  • output_files (list) – list of the files produced by the function that should be retrieved to the host computer.

  • arguments (dict) – keyword arguments of the function. The arguments should be serializable without requiring too much disk space, and they must not share a name with any of the other parameters.

  • **kwargs (dict) – Further keyword arguments of the script, which will be substituted in the string representation.
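The {‘cls’: Class, ‘func’: function} format of extra_encoder_functions can be pictured with the standard json module. The sketch below is not the BigDFT implementation: the Vector class and the encode_vector and make_default helpers are hypothetical, and it only assumes that func maps an instance of cls to something JSON can already handle.

```python
import json


class Vector:
    """Hypothetical user-defined class that json cannot serialize natively."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def encode_vector(vec):
    """Turn a Vector into a plain dict of JSON-friendly values."""
    return {"x": vec.x, "y": vec.y}


# Same shape as the documented extra_encoder_functions entries.
extra_encoder_functions = [{"cls": Vector, "func": encode_vector}]


def make_default(encoders):
    """Build a `default` hook for json.dumps from a list of encoder entries."""
    def default(obj):
        for entry in encoders:
            if isinstance(obj, entry["cls"]):
                return entry["func"](obj)
        raise TypeError(f"Cannot serialize object of type {type(obj).__name__}")
    return default


# Serialize an argument dictionary that contains a non-intrinsic object.
payload = json.dumps({"arg": Vector(1, 2)},
                     default=make_default(extra_encoder_functions))
```

An object with no matching entry still raises TypeError, which is the usual json behaviour for unsupported types.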

RemoteRunner exists to allow transfer of generic python functions to a remote system. Functions will be serialised via the user specified format and transferred over along with their arguments.

Consider a simple function:

>>> def test_func(arg):
...     return arg

This function can be run on a remote machine using RemoteRunner, as shown in the following example:

Let's assume we have access to a remote machine with the IP address

192.168.0.0

First, construct a URL connection to this machine using the URL module:

>>> from BigDFT.URL import URL
>>> url = URL(user='user', host='192.168.0.0')

Now pass this connection and the function to the RemoteRunner, along with any arguments required:

>>> from BigDFT.RemoteRunners import RemoteRunner
>>> remote_run = RemoteRunner(function=test_func,
...                           url=url,
...                           arguments={'arg': 'This is a test argument'})

Run your code with the run() method; the result can then be retrieved using the fetch_result() method. Whether or not a run has finished can be determined using the is_finished() method.

>>> remote_run.run()
>>> import time
>>> # while we don't have a finished run, wait
>>> while not remote_run.is_finished():
...     time.sleep(1)
>>> result = remote_run.fetch_result()
>>> print(result)
This is a test argument
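The waiting loop above never gives up if the remote run stalls. A bounded variant might look like the following sketch, which relies only on the is_finished() method shown above; the wait_for helper, its poll_interval and max_wait parameters, and the FakeRunner stand-in are hypothetical names introduced for illustration.

```python
import time


def wait_for(runner, poll_interval=1.0, max_wait=60.0):
    """Poll runner.is_finished() until it returns True or max_wait elapses.

    Returns True if the run finished, False if the time budget ran out.
    """
    deadline = time.monotonic() + max_wait
    while not runner.is_finished():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


class FakeRunner:
    """Stand-in exposing is_finished(); reports True after a set number of polls."""

    def __init__(self, polls_needed):
        self.polls_left = polls_needed

    def is_finished(self):
        self.polls_left -= 1
        return self.polls_left <= 0
```

With a real runner, the call would be wait_for(remote_run) followed by remote_run.fetch_result() only when the helper returns True.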

You now have the basic ingredients to begin running generic functions on remote machines. These methods accept further options; see their documentation for details.

pre_processing()

Ensure protected arguments, and gather the files to send.

process_run(files)

Run the calculations.

Most skip/force/async logic is evaluated here.

post_processing(files, status)

Fetch the results of finished runs if asynchronous, or of all runs otherwise.
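The three signatures above suggest a pipeline in which the output of one stage feeds the next. The toy class below only illustrates that hand-off. The ToyRunner class, its run() driver, the file names, and the dictionary return values are all assumptions made for illustration, not the BigDFT implementation.

```python
class ToyRunner:
    """Hypothetical runner mimicking the three documented stages."""

    def __init__(self, asynchronous=True):
        self.asynchronous = asynchronous
        self.log = []

    def pre_processing(self):
        # Gather the files that would be sent to the remote machine.
        self.log.append("pre")
        return {"files": ["toy-run.sh", "toy-args.json"]}

    def process_run(self, files):
        # Pretend to submit; an asynchronous run is not finished on return.
        self.log.append("run")
        return {"status": "submitted" if self.asynchronous else "finished"}

    def post_processing(self, files, status):
        # Only hand back a result when the run has already finished.
        self.log.append("post")
        return "result" if status == "finished" else None

    def run(self):
        # Chain the stages, passing each stage's output to the next.
        prepared = self.pre_processing()
        outcome = self.process_run(**prepared)
        return self.post_processing(**prepared, **outcome)
```

In this sketch a synchronous run returns its result directly, while an asynchronous one returns None and leaves the result to be fetched later.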

class RemoteDataset(label='RemoteDataset', run_dir='/tmp', database_file='database.yaml', force=False, **kwargs)

Defines a set of remote runs, executed from a base script on a provided url. This class is associated with a set of remote submissions, which may contain multiple calculations. All those calculations are sent to a single url, with a single base script, but through a collection of remote runners that may provide different arguments.

Parameters
  • label (str) – main label of the dataset.

  • run_dir (str) – local directory in which the data are prepared.

  • database_file (str) – name of the database file to keep track of the submitted runs.

  • force (bool) – force the execution of the dataset regardless of the database status.

  • **kwargs – global arguments of the appended remote runners.

append_run(id, remote_runner=None, **kwargs)

Add a remote run into the dataset.

Append to the list of runs to be performed the corresponding runner and the arguments which are associated with it.

Parameters
  • id (dict) – the id of the run, useful to identify the run in the dataset. It has to be a dictionary as it may contain different keywords. For example, a run might be classified as id = {'hgrid': 0.35, 'crmult': 5}.

  • remote_runner (RemoteRunner) – an instance of a remote runner that will be employed.

  • **kwargs – arguments required for the creation of the corresponding remote runner. They will be combined with the global arguments.

Raises

ValueError – if the provided id is identical to another previously appended run.
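Because run ids are dictionaries, two ids are identical when they hold the same key/value pairs, whatever the order of the keys. The sketch below shows how such a duplicate check could behave. ToyDataset is a hypothetical stand-in, not the RemoteDataset class, and the error message is invented.

```python
class ToyDataset:
    """Hypothetical container that rejects duplicate run ids."""

    def __init__(self):
        self.ids = []

    def append_run(self, id):
        # Dictionaries compare by content, so key order does not matter.
        if id in self.ids:
            raise ValueError(f"A run with id {id} was already appended")
        self.ids.append(id)


dataset = ToyDataset()
dataset.append_run({'hgrid': 0.35, 'crmult': 5})
dataset.append_run({'hgrid': 0.40, 'crmult': 5})  # differs in hgrid: accepted
```

Appending {'crmult': 5, 'hgrid': 0.35} to this toy dataset would raise ValueError, since it matches the first id.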

pre_processing()

Set up the dataset prior to the run. Gather and send data to the runners.

For each appended runner: register it within the database, collect the files to be sent, then send the files to the run directory.

process_run()

Run the dataset by explicitly running each item of the runs_list.

is_finished(anyfile=True, verbose=False, timeout=-1)

Returns the result of the is_finished method of each runner present

Parameters
  • anyfile (bool) – Checks for file recency if False

  • verbose (bool) – Will not print checking status if False

  • timeout (int) – Number of times each is_finished call can fail before raising an error. -1 to disable (Default)

Returns

{irun: finished}

Return type

dict

all_finished(anyfile=True, verbose=False, timeout=-1)

Returns all() of is_finished methods of each runner present

Parameters
  • anyfile (bool) – Checks for file recency if False

  • verbose (bool) – Will not print checking status if False

  • timeout (int) – Number of times each is_finished call can fail before raising an error. -1 to disable (Default)

Returns

True if all runs have finished

Return type

bool
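The two methods differ only in how the per-run status is reported: a {irun: finished} dictionary versus a single boolean. The following sketch shows how one reduces to the other. The collapse helper and the sample status values are hypothetical and do not call BigDFT.

```python
def collapse(status):
    """Collapse a {irun: finished} mapping into a single boolean."""
    return all(status.values())


# Hypothetical per-run status, in the {irun: finished} shape of is_finished().
status = {0: True, 1: False, 2: True}

# The dictionary form also tells which runs are still pending.
pending = [irun for irun, finished in status.items() if not finished]
```

The dictionary form is the one to use when only the unfinished runs need attention; the boolean form suits a simple wait loop.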

fetch_results(id=None)

Retrieve some attribute from some of the results.

Selects out of the results the objects which have in their id at least the dictionary specified as input. May return an attribute of each result if needed.

Parameters

id (dict) – dictionary of the id to retrieve. Returns a list of the runs whose id contains the provided id, in the order given by append_run(). If absent, the entire list of runs is returned.
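The selection rule ("at least the dictionary specified as input") amounts to a subset test on the id dictionaries. Here is a minimal sketch of that rule, assuming runs are stored as (id, result) pairs in append order; the select_runs helper and the sample data are hypothetical.

```python
def select_runs(runs, id=None):
    """Return the results whose id contains every key/value pair of `id`."""
    if id is None:
        return [result for _, result in runs]
    return [result for run_id, result in runs
            if all(key in run_id and run_id[key] == value
                   for key, value in id.items())]


# Hypothetical (id, result) pairs, in the order they were appended.
runs = [
    ({'hgrid': 0.35, 'crmult': 5}, 'A'),
    ({'hgrid': 0.40, 'crmult': 5}, 'B'),
    ({'hgrid': 0.35, 'crmult': 7}, 'C'),
]

coarse = select_runs(runs, {'hgrid': 0.35})
```

Here coarse holds the results of the first and third runs, in append order, because both ids include 'hgrid': 0.35.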

class RunsDatabase(database_file)

Contains the list of runs which have been submitted.

Parameters

database_file (str) – name of the file used to store the data

exists(name)

Checks if a name exists in the database.

register(name)

Include a name in the database.

clean()

Remove the database information.
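The three methods describe a small persistent registry of run names. The sketch below mirrors that interface with a JSON file from the standard library. It is a hypothetical stand-in, not the RunsDatabase class, and the on-disk format used by BigDFT may well differ.

```python
import json
import os


class ToyRunsDatabase:
    """Hypothetical file-backed list of submitted run names."""

    def __init__(self, database_file):
        self.filename = database_file
        self.names = []
        # Reload names registered by a previous session, if any.
        if os.path.isfile(self.filename):
            with open(self.filename) as handle:
                self.names = json.load(handle)

    def _write(self):
        with open(self.filename, "w") as handle:
            json.dump(self.names, handle)

    def exists(self, name):
        """Check whether a name is already in the database."""
        return name in self.names

    def register(self, name):
        """Add a name to the database and persist it to disk."""
        if not self.exists(name):
            self.names.append(name)
            self._write()

    def clean(self):
        """Forget all names and remove the file from disk."""
        self.names = []
        if os.path.isfile(self.filename):
            os.remove(self.filename)
```

Persisting on every register() keeps the file consistent if the session is interrupted, at the cost of one write per run.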

computer_runner(func, submission_script, url=None, validate: bool = True, **kwargs)

Create a runner based on the specification of a computer.

A function is transformed into a remote runner from the specification of a computer.

Parameters
  • func (func) – function to be transformed

  • submission_script (BigDFT.RemoteRunnerUtils.CallableAttrDict) – specification dictionary of the computer. Should contain the attribute submitter.

  • url (BigDFT.URL.URL) – The url of the computer

  • validate (bool, optional) – Validate input parameters if True

  • **kwargs – arguments of the RemoteRunner

Returns

a new remote runner ready for the computer.

Return type

RemoteRunner

computer_script(commands, url, **kwargs)

Design a script to be executed remotely.

This function provides a RemoteScript instance which can be used to run a particular script remotely on the machine.

Parameters
  • commands (BigDFT.RemoteRunnerUtils.CallableAttrDict) – specification dictionary of the script. Must contain the ‘prefix’ attribute.

  • **kwargs – arguments of the RemoteRunner function. Default choices are made for the positional arguments unless otherwise specified.

Returns

the instance of the RemoteScript.

Return type

BigDFT.RemoteRunnerUtils.RemoteScript